Short Text Clustering Algorithms, Application and Challenges: A Survey

Abstract

The number of online documents has rapidly grown, and with the expansion of the Web, document analysis, or text analysis, has become an essential task for preparing, storing, visualizing and mining documents. The texts generated daily on social media platforms such as Twitter, Instagram and Facebook are vast and unstructured. Most of these generated texts come in the form of short text and need special analysis because short text suffers from lack of information and sparsity. Thus, this topic has attracted growing attention from researchers in the data storing and processing community for knowledge discovery. Short text clustering (STC) has become a critical task for automatically grouping various unlabelled texts into meaningful clusters. STC is a necessary step in many applications, including Twitter personalization, sentiment analysis, spam filtering, customer reviews and many other social network-related applications. In the last few years, the natural-language-processing research community has concentrated on STC and attempted to overcome the problems of sparseness, dimensionality, and lack of information. We comprehensively review various STC approaches proposed in the literature. Providing insights into the technological component should assist researchers in identifying the possibilities and challenges facing STC. To gain such insights, we review various literature, journals, and academic papers focusing on STC techniques. The contents of this study are prepared by reviewing, analysing and summarizing diverse types of journals and scholarly articles with a focus on STC techniques from five authoritative databases: IEEE Xplore, Web of Science, Science Direct, Scopus and Google Scholar. This study focuses on STC techniques: text clustering, challenges to short texts, pre-processing, document representation, dimensionality reduction, similarity measurement of short text and evaluation.
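The stages named at the end of the abstract (pre-processing, representation, similarity measurement, clustering and evaluation) can be illustrated with a minimal Python sketch. It is not a method taken from the survey: the stopword list, the example texts, the gold labels, the threshold value and the greedy single-pass clustering rule are all assumptions chosen for illustration, and dimensionality reduction is omitted for brevity.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the survey).
STOPWORDS = {"a", "the", "is", "to", "on", "this", "your"}

def preprocess(text):
    """Pre-processing: lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tfidf_vectors(docs):
    """Representation: sparse TF-IDF dicts using a smoothed IDF variant."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors

def cosine(u, v):
    """Similarity measurement: cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def single_pass_cluster(vectors, threshold=0.3):
    """Clustering: each text joins the cluster of its most similar earlier
    text if that similarity reaches the threshold, else starts a new cluster."""
    labels = []
    for i, vec in enumerate(vectors):
        best_label, best_sim = None, 0.0
        for j in range(i):
            sim = cosine(vec, vectors[j])
            if sim > best_sim:
                best_label, best_sim = labels[j], sim
        if best_label is not None and best_sim >= threshold:
            labels.append(best_label)
        else:
            labels.append(max(labels, default=-1) + 1)
    return labels

def purity(labels, gold):
    """Evaluation: fraction of texts that carry their cluster's majority gold label."""
    by_cluster = {}
    for label, g in zip(labels, gold):
        by_cluster.setdefault(label, Counter())[g] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)

# Hypothetical short texts and gold labels, invented for this example.
texts = [
    "Great phone, amazing battery life",
    "Battery life on this phone is great",
    "Win a free prize, click this link now",
    "Click the link to claim your free prize",
]
gold = ["review", "review", "spam", "spam"]

labels = single_pass_cluster(tfidf_vectors([preprocess(t) for t in texts]))
print(labels, purity(labels, gold))
```

The example also hints at the sparsity problem the abstract describes: the two review texts and the two spam texts share no terms at all, so their cosine similarity is exactly zero, and two short texts about the same topic that happen to use different words would be separated in the same way.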

Related resources

A Survey of Text Clustering Algorithms

Clustering is a widely studied data mining problem in the text domains. The problem finds numerous applications in customer segmentation, classification, collaborative filtering, visualization, document organization, and indexing. In this chapter, we will provide a detailed survey of the problem of text clustering. We will study the key challenges of the clustering problem, as it applies to the...

Text Clustering Algorithms: A Review

With the growth of the Internet, the amount of text data created by different media, such as social networking sites, the web, and other information sources, is increasing. This data is in an unstructured format, which makes it tedious to analyze, so we need methods and algorithms that can be used with various types of text formats. Clustering is an important part of data mining. Clust...

Survey of Text Clustering

Clustering text documents into different category groups is an important step in indexing, retrieval, management and mining of abundant text data on the Web or in corporate information systems. The text clustering task can be intuitively described as finding, given a set of vectors of some data points in a multi-dimensional space, a partition of text data into clusters such that the points within each...

A Survey on Text Based Clustering

Clustering is the main technique for data analysis, and it deals with the organisation of a set of objects in a multidimensional space into cohesive groups called clusters. Every cluster contains closely related objects that are very dissimilar to the objects in other clusters. Cluster analysis aims at discovering the objects with the same behaviour in a collection. Thus, if an object satisfies a rule, the ...

Parallel Clustering Algorithms: Survey

Clustering is grouping input data sets into subsets, called ’clusters’ within which the elements are somewhat similar. In general, clustering is an unsupervised learning task as very little or no prior knowledge is given except the input data sets. The tasks have been used in many fields and therefore various clustering algorithms have been developed. Clustering task is, however, computationall...

Journal

Journal title: Applied Sciences

Year: 2022

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13010342